SERVING CONCURRENT TCP/IP CONNECTIONS OF MULTIPLE
VIRTUAL INTERNET USERS WITH A SINGLE THREAD

This patent application claims the benefit of priority under 35 U.S.C. 119(e) from U.S.
Provisional Patent Application Serial No. 60/430,309, filed December 12, 2002, entitled
"SERVING CONCURRENT TCP/IP CONNECTIONS OF MULTIPLE VIRTUAL INTERNET
USERS WITH A SINGLE THREAD", which is hereby incorporated by reference as if set forth
in its entirety herein.

Field of the Invention

This invention relates generally to computing systems, and more specifically to the
testing of servers or other distributed or networked computer systems.

Background of the Invention 

The wide adoption of the Internet, and networked computing in general, has resulted in
the proliferation of computer servers. A server can be generally defined as a computer that
provides services to other computers over a network. Among a server's many uses are the
distribution of web pages, e-mail messages, files, and electronic newsgroups, and the support
of multi-user virtual environments.

How a server's performance is evaluated usually depends on the nature of the server and the
particular purpose for which it is used. In the case of the Internet, one of the primary purposes
of an Internet server is to process a large number of requests coming from a large number of
different computers or users.

When a server approaches its maximum load it tends to slow down significantly. Thus,
when a server accepts too many requests from too many users, it increases the time needed to
process each request. As a result, all users experience degraded quality of service. For this reason
it is very important for server administrators and network administrators, as well as software
developers, to be able to choose and configure their servers in a way that enables them to handle
foreseeable loads without significant service degradation.

One of the best ways to determine a server's performance capabilities is to test it. Testing
is usually performed by simulating the environment of users and computers that are meant to be
served by a server. While a server may be required to serve many thousands of users, using
thousands of computers to simulate these users is usually impracticable. Thus, for testing purposes
many users are simulated using one or a small number of testing or simulation computers.
These simulation computers run testing software, which is designed to simulate many users or
computers that are making requests to, or generally exchanging information with, the server.
Usually these simulated users are called virtual users.

Testing software usually utilizes a multitasking Operating System (OS) and CPU. Most
modern multitasking OSs support both process and thread multitasking. A thread is the basic
unit of program execution. It includes a list of instructions that is treated by the processor as a
single sequential flow of control. A process, on the other hand, is a larger unit of program
execution that may contain several threads. At any time a computer may be executing several
processes concurrently and several threads within each process.

In reality, commonly used CPUs do not execute more than one thread or process at the
same time. They just create that illusion by quickly switching between threads or processes.
When a CPU switches between two threads or processes, it must execute a context switch, which
means that it must replace all the data and instructions associated with the old thread or process
with those associated with the new thread or process. Threads, however, have very little data that
is unique to them; they usually share the process data with the other threads of the same process.
Thus, switching between different threads in a single process is less resource consuming than
switching between different processes, since switching between different threads of the same
process requires the replacement of smaller amounts of data. So a single application can run
several tasks as threads concurrently, without incurring the higher context switching costs of
running several processes concurrently.

Server testing software often takes advantage of the multitasking capabilities of an OS by
simulating each virtual user as a different process. An ordinary Internet user engaged in usual
Internet activity, such as web browsing, will probably use several threads that make network
calls. Thus, it would be accurate to simulate such a user by a process that contains several
threads, each creating a network connection to the server.

Most commonly used operating systems are engineered to provide high performance for
relatively small numbers of threads and processes. However, good server testing software should
be able to simulate thousands of virtual users on a single machine. This would result in
thousands of threads, which a commonly used OS may not be able to handle efficiently.

Summary of the Invention

The present invention is directed to a system and method of increasing the efficiency of a
program for testing servers or other multi-user computer systems. The increase in efficiency is
achieved by generating the network traffic of many virtual users with only a single thread. That
is made possible by the removal of blocking calls from the virtual user code and by the use of a
worker thread model for processing the communication requests of the TCP/IP traffic of the
virtual users.

A blocking call is a function call made by a thread that invokes a function that is outside
of that thread. The function in question is usually an OS function. Another aspect of the blocking
call is that thread execution is halted until the function finishes execution. An example of a
blocking call that often comes up in server testing programs is a TCP/IP call designed to send or
receive some information over a network, or to establish a connection over a network. Blocking
calls usually take a long time to process, as measured in CPU cycles, and thus cause
inefficiencies. The delay may be partially avoided by stopping the processing of threads that are
waiting for blocking calls and instead continuing execution of threads that are not waiting, but
even such switches have delay costs associated with them, and those costs are high in the server
testing environment, where thousands of threads may run on one machine. Without blocking calls,
the present invention is able to create several connections per virtual user, while using a single
thread to serve all connections of many virtual users. Since all connections of the virtual users are
simulated on a single thread, all the virtual users as well as all other functionality of the server
testing software can be implemented as multiple threads in a single process. This significantly
decreases the number of context switches between different processes and threads. The fact that
all the network traffic of the virtual users is handled by a single thread almost eliminates the
connection-related context switches.

It should be noted that regardless of the fact that all the network connections of all virtual
users are handled by a single thread, there still exist separate virtual user threads that run
non-network virtual user simulation code, and supervise the network connections associated with
each virtual user.

The present invention uses a feature typically found in commonly used operating systems.
This feature is the non-blocking function call, which has essentially the same functionality
as the blocking function call, with one significant difference. When a thread makes a
non-blocking function call, the thread will not stop execution, as it would have if it had made
the blocking function call. Instead, the thread will continue to execute while the non-blocking
function call is pending or being processed somewhere else. When the non-blocking code finishes
execution it will store a notification of completion, as well as the associated return values of the
call, if any, at a location that can be accessed by the thread. The thread must be able to handle
such notifications. Non-blocking calls are sometimes referred to as asynchronous calls or
overlapped calls.
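
By way of illustration only, the behavior of a non-blocking call can be sketched in Python. The
sketch below is not the operating system mechanism itself: it assumes a helper thread pool as
the "somewhere else" that processes the call, and the names slow_operation, call_non_blocking
and notifications are hypothetical. It shows the caller continuing to execute and later finding
the notification of completion, with its return value, at a location it can access.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

# Location, accessible to the calling thread, where completions are stored.
notifications: queue.Queue = queue.Queue()


def slow_operation(payload: str) -> str:
    """Hypothetical stand-in for a slow call such as a network send."""
    time.sleep(0.05)
    return payload.upper()


def call_non_blocking(executor: ThreadPoolExecutor, payload: str) -> None:
    """Start the operation and return at once instead of waiting for it."""
    future = executor.submit(slow_operation, payload)
    # When the operation finishes, store a notification and its return value.
    future.add_done_callback(lambda f: notifications.put(("done", f.result())))


with ThreadPoolExecutor(max_workers=1) as executor:
    call_non_blocking(executor, "ping")
    # The calling thread keeps executing while the call is pending.
    work_done_meanwhile = sum(range(1000))
    # Later, the calling thread handles the notification of completion.
    notification = notifications.get(timeout=5)
```

A real overlapped call is carried out by the operating system rather than by a helper thread;
the thread pool is used here only so that the sketch runs on its own.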

According to the present invention, blocking calls are removed from the user simulation
code and replaced with non-blocking abstracted requests to a separate dedicated module. The
module handles these requests by initiating non-blocking or asynchronous TCP/IP calls. The
module also receives and processes the asynchronous notifications indicating the completion of
the calls. When the module determines that a request is complete, it alerts the virtual user thread
that initiated the request of the request's completion, and returns to that thread any return value
that is available.

Not all blocking calls must be removed. It is possible to program the module in such a 
way that it handles only certain blocking calls, preferably the ones that cause most performance 
difficulties. 

Brief Description of the Drawings 

The foregoing and other features of the present invention will be more readily apparent
from the following detailed description and drawings of the illustrative embodiments of the
invention wherein like reference numbers refer to similar elements and in which:

Figure 1 is a block diagram of a prior art implementation of a server testing system; 

Figure 2 is a block diagram of the server testing system of the present invention;

Figure 3 is a block diagram of the request processor of the present invention; and 

Figure 4 is a flowchart of the operation of the worker thread. 

Detailed Description of the Illustrative Embodiments

Figure 1 illustrates a prior art implementation of a server testing system and software.
The testing software creates multiple virtual users 100. Each virtual user is simulated as a
different process. Only two virtual users are shown, but there may be thousands. Each virtual user
process may contain several threads 102. Each of these threads includes blocking requests 106
for network communication with the server 108, which create a TCP/IP connection 104 with the
server 108. The TCP/IP connections are implemented through the operating system 110. Each
thread 102 creates one connection, and thus a virtual user creates several threads, which in turn
create several connections.

A disadvantage of the prior art implementation is that effective server testing software
must support thousands of virtual users, each virtual user containing several TCP/IP connections.
If each TCP/IP connection is represented by a single thread, the number of threads may
overwhelm the computer on which the server testing software is being run. Furthermore, blocking
TCP/IP calls tend to cause context switches. The prior art implementation involves thousands of
threads that each frequently make blocking TCP/IP calls. The resulting high rate of context
switches is likely to significantly degrade performance.

Figure 2 illustrates an embodiment of the server testing system of the present invention.
In accordance with the present invention, a queue 202 is used to process thread requests. A queue
is a data structure that operates on the FIFO (first in, first out) principle. Items are removed from
the queue in the order in which they were placed. An item will not be removed until all items that
were placed before it are removed. As used herein, the term enqueue will mean to add an item to
the queue. The term dequeue will mean to remove an item from the queue.
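
The FIFO behavior defined above can be sketched as follows. This is an illustration only; it
assumes Python's standard queue.Queue as the queue, with put corresponding to enqueue and
get corresponding to dequeue, and the item names are hypothetical.

```python
import queue

completion_queue: queue.Queue = queue.Queue()

# Enqueue three items in a known order.
for item in ("first", "second", "third"):
    completion_queue.put(item)

# Dequeue them; each item comes out only after all items placed before it.
dequeued = [completion_queue.get() for _ in range(3)]
```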

In this embodiment, a request processor 200 is used to handle all TCP/IP calls. The
request processor includes a completion queue 202 and a worker thread 204. Each virtual user is
simulated as a separate single thread 201. The virtual users do not make TCP/IP requests directly
to the operating system. Instead, they create request objects 210 and enqueue the request objects
in the completion queue 202 within the request processor 200. The worker thread 204 processes
the request objects 210 in the completion queue 202 by making the actual TCP/IP calls 212 to
the OS 110. The OS 110 in turn creates multiple TCP/IP connections 206 with the server 108.
When a particular TCP/IP call is finished, the OS 110 notifies the request processor 200 by
enqueueing a notification of completion in the completion queue 202. The notification of
completion contains a reference to the request object for which the TCP/IP call was made. The
worker thread 204 within the request processor 200 eventually dequeues the notification of
completion. The worker thread 204 then processes the notification of completion. This processing
will be described in more detail below. When all the TCP/IP operations associated with a request
object are completed, the worker thread 204 will notify the virtual user thread 201 that initiated
the request object that the request is complete, and will send the virtual user thread any return
information that is available. Return information may include various operation codes, error
codes, or data received from the server.
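
The division of labor between the virtual user threads and the single worker thread can be
sketched as follows. This is a simplified illustration under stated assumptions: the TCP/IP work
is replaced by a placeholder reply, a threading.Event is assumed as the means by which the
worker alerts a virtual user thread (the description does not prescribe one), and the names
Request, worker and virtual_user, as well as the None shutdown sentinel, are hypothetical.

```python
import queue
import threading


class Request:
    """Hypothetical request object created by a virtual user thread."""

    def __init__(self, user_id: str, data: bytes) -> None:
        self.user_id = user_id
        self.data = data
        self.result = None                    # space for return information
        self.completed = threading.Event()    # assumed notification mechanism


def worker(completion_queue: queue.Queue) -> None:
    """Single thread that serves the requests of every virtual user."""
    while True:
        request = completion_queue.get()
        if request is None:                   # shutdown sentinel for the sketch
            break
        request.result = b"reply to " + request.data   # placeholder for TCP/IP work
        request.completed.set()               # alert the initiating thread


def virtual_user(completion_queue: queue.Queue, user_id: str, results: dict) -> None:
    """Virtual user thread: enqueue a request instead of calling the OS."""
    request = Request(user_id, user_id.encode())
    completion_queue.put(request)
    request.completed.wait(timeout=5)
    results[user_id] = request.result


completion_queue: queue.Queue = queue.Queue()
results: dict = {}
worker_thread = threading.Thread(target=worker, args=(completion_queue,))
worker_thread.start()
users = [
    threading.Thread(target=virtual_user, args=(completion_queue, f"vu-{i}", results))
    for i in range(3)
]
for user in users:
    user.start()
for user in users:
    user.join()
completion_queue.put(None)
worker_thread.join()
```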

This embodiment of the invention utilizes an operating system feature such as, for
example, the Microsoft Windows overlapped I/O mechanism, which allows one thread to
asynchronously support many concurrent TCP/IP connections. The way this feature is used can be
seen in Figure 3, where the request processor is shown in greater detail. The completion queue 202,
which is found, for example, in the Microsoft Windows OS, is used by the OS to store
notifications of completion of asynchronous operations. In this embodiment of the invention, the
completion queue also stores TCP/IP requests from the virtual users.

When a virtual user thread 201 needs to make a TCP/IP request 300, the request is not
sent directly to the OS; instead, a request object 304 or 306 is created. The request object 304 or
306 is a data structure which typically contains the following elements: information identifying
the virtual user thread that created that object, information that describes the nature of the
request, the state of the request, and some space for return value information that is unused at this
point. Once the request object is created, the virtual user thread enqueues the object in the
completion queue 202.
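
One possible layout for such a request object is sketched below. The four groups of fields follow
the elements listed above; every concrete name (RequestObject, RequestState, owner_thread,
operations, payload, return_data, error_code) is a hypothetical choice made for this
illustration and is not taken from the described embodiment.

```python
import queue
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RequestState(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"


@dataclass
class RequestObject:
    owner_thread: str                 # identifies the virtual user thread
    operations: List[str]             # nature of the request, e.g. send then receive
    payload: bytes = b""              # data to be sent, if any
    state: RequestState = RequestState.PENDING   # state of the request
    return_data: bytes = b""          # space for return values, unused at first
    error_code: Optional[int] = None  # space for an error code, unused at first


completion_queue: queue.Queue = queue.Queue()

# The virtual user thread creates the object and enqueues it.
request = RequestObject(
    owner_thread="virtual-user-1",
    operations=["send", "receive"],
    payload=b"GET / HTTP/1.0\r\n\r\n",
)
completion_queue.put(request)
```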

The request objects 304, 306, etc. in the completion queue 202 are processed by the
worker thread 204, which dequeues an item from the completion queue 202, processes it, and
moves on to the next item. When the worker thread 204 dequeues a request object, it usually
makes an asynchronous TCP/IP call. The parameters of the TCP/IP call will depend on the
information in the request object. For example, if the request object indicates that some data needs
to be sent to the server, the worker thread will make the corresponding call to send that data. The
TCP/IP call will also include a reference (usually a pointer) to the request object for which the
call is made. Because the TCP/IP call is asynchronous, the worker thread need not wait for the
call to complete. Instead, the worker thread moves on to the next item in the queue.
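
The dispatch step can be sketched as follows. The FakeOS class is a hypothetical stand-in for
the operating system's asynchronous TCP/IP interface, assumed here only so that the sketch
runs on its own: its send_async method returns immediately and posts a completion notification
to the queue a short time later. The point illustrated is that the worker passes a reference to
the request object with each call and moves on to the next item without waiting.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class RequestObject:
    user: str
    data: bytes


@dataclass
class CompletionNotification:
    kind: str
    request: RequestObject        # reference back to the originating request


class FakeOS:
    """Hypothetical stand-in for the OS asynchronous TCP/IP interface."""

    def __init__(self, completion_queue: queue.Queue) -> None:
        self.completion_queue = completion_queue

    def send_async(self, data: bytes, request: RequestObject) -> None:
        # Return at once; a timer enqueues the notification later.
        notification = CompletionNotification("send finished", request)
        threading.Timer(0.01, self.completion_queue.put, args=(notification,)).start()


completion_queue: queue.Queue = queue.Queue()
fake_os = FakeOS(completion_queue)
requests = [RequestObject("vu-1", b"GET /a"), RequestObject("vu-2", b"GET /b")]
for request in requests:
    completion_queue.put(request)

# Worker side: dequeue each request, start its call, and move on immediately.
dispatched = []
for _ in range(len(requests)):
    request = completion_queue.get()
    fake_os.send_async(request.data, request)
    dispatched.append(request.user)

# The completions arrive afterwards, each carrying the same object reference.
notifications = [completion_queue.get(timeout=5) for _ in requests]
```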

When the operating system 110 receives a TCP/IP call from the worker thread 204, it
executes that call by sending data to and/or receiving data from the server 108 through TCP/IP
connections 206. After the execution is complete, the OS 110 enqueues a completion notification,
containing a reference to the request object, into the completion queue 202. This completion
notification also contains return values and error codes, if applicable. Examples of such completion
notifications, enqueued by the OS, in Figure 3 are the "send finished notification" 308, 310 and the
"receive finished notification" 312, 314.

In Figure 3 two virtual user threads 201 make requests 300 and 302, respectively, to
communicate information with the server. Each virtual user creates a request object (304 and
306, respectively) and sends it to the completion queue 202. Each request object indicates that a
send and receive is requested, and includes the data to be sent. The worker thread 204 dequeues
these request objects according to the order in which they were enqueued and invokes a TCP/IP
call for each object by executing the appropriate OS function(s). When making a TCP/IP call,
the worker thread 204 provides a reference to the request object with the TCP/IP call. When the
send operations are completed, the operating system 110 enqueues in the completion queue
notifications of completion for each send operation. Each notification contains the reference to the
request object, which was previously provided with the TCP/IP call. Subsequently the worker
thread 204 dequeues completion notification 308, which contains a reference to request object
304 (labeled in Figure 4 as "request object 1"). It discovers that the send for this object has been
completed, so it changes the state of request object 304 to indicate that the particular send
operation has been completed, and it executes the next TCP/IP operation in the request object,
namely the receive. The worker thread then proceeds to dequeue the next item in the queue 202,
completion notification 310 (containing a reference to request object 306). The worker thread once
again determines that the send for object 306 is completed, so it changes the state of the
referenced request object 306 to indicate that the send was completed. By examining object 306 the
worker thread determines that a receive still needs to be completed, so it executes a receive for
object 306 and goes back to the completion queue.

After the operating system 110 processes each of these receive operations, it once again
enqueues new completion notifications 312 and 314, which reference objects 304 and 306
respectively, back into the queue. The worker thread 204 dequeues completion notification 312 and,
by examining the request object 304 referenced in completion notification 312, determines that the
entire request is finished for that request object. The worker thread 204 then notifies the virtual
user thread that initiated request object 304 that the request has been completed and sends to this
virtual user thread any return information. Return information can be sent directly to the virtual
user or by embedding that information in request object 304, which the virtual user can access.
The worker thread then dequeues the next item on the queue, which is notification of completion
314. The worker thread determines that request object 306, referenced in that notification, is
complete as well, so it similarly notifies the virtual user thread that initiated that request of the
completion and sends this virtual user thread any available return information. It should be noted
that objects 304 and 306 have their states changed by the worker thread as various TCP/IP
operations are performed. The request objects are generally used by the worker thread to store the
state and other information associated with the corresponding requests.

Figure 4 shows in detail the operation of the worker thread 204. First, at step 404, the
worker thread 204 dequeues an item from the completion queue. The dequeueing is performed
using a blocking function call. If the completion queue is empty, the blocking function will wait
until an item is enqueued in the completion queue. Thus the worker thread 204 will stop
execution if the completion queue 202 is empty, and not start again until a virtual user thread 201 or
the OS 110 enqueues an item into the completion queue 202. Next, at step 406, the worker thread
analyzes the item dequeued. The item can either be a request object sent from a virtual user 201,
or a completion notification sent from the OS 110 that references a request object. If the item is a
completion notification, at this step the worker thread updates the referenced request object to
indicate that a TCP/IP operation has been completed and adds any data present in the notification
of completion that may be relevant. Such data may include data sent from the server, error
information, time stamps, etc. At step 408 the worker thread processes the data in the object. The
data referred to here may include the data that is meant to be sent to the server or the data that is
meant to be received from the server. The processing step determines whether a receive
operation has resulted in the receipt of all necessary data. If the object has already had a receive
operation performed on it during a previous pass through the worker thread, the worker thread will
not treat the receive operation as finished until it checks the data that was received and ascertains
that all the data needed was received. If this is not the case, the worker thread will schedule
another receive operation to be performed on that object. At step 410, the worker thread determines
whether all the operations needed to fulfill the request for which the request object was created are
completed. If this is true, the worker thread moves on to step 412. At step 412, the worker thread
notifies the virtual user that originated the request that the request has been completed and sends
the virtual user the request object (or alternatively directly sends the virtual user all returned
data). After step 412, the worker thread returns to step 404 to dequeue another item from the
queue. Alternatively, if at step 410 it is determined that not all of the operations needed to fulfill
the request for which the request object was created are completed, the worker thread moves on
to step 414. At step 414 the worker thread asynchronously performs the next uncompleted
operation listed in the request object. This can be either sending or receiving data. In certain cases it
may include other operations, such as opening a TCP/IP connection, closing a TCP/IP
connection, etc. After performing the next operation in the request object, the worker thread returns to
step 404 to dequeue the next item from the queue.
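
The loop of steps 404 through 414 can be sketched as follows. This is an illustration under
stated assumptions, not the implementation of the embodiment: FakeOS is a hypothetical
stand-in for the operating system that echoes each payload back in 4-byte pieces, so that a
single receive does not return all the data and the check at step 408 is exercised. The names
Request, Completion and expected_len, the threading.Event used at step 412, and the None
shutdown sentinel are likewise assumptions made for the sketch.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    payload: bytes
    expected_len: int                      # bytes the full reply must contain
    ops: list = field(default_factory=lambda: ["send", "receive"])
    received: bytes = b""
    done: threading.Event = field(default_factory=threading.Event)


@dataclass
class Completion:
    request: Request                       # reference to the request object
    op: str
    data: bytes = b""


class FakeOS:
    """Hypothetical OS stand-in: echoes the payload back 4 bytes at a time."""

    CHUNK = 4

    def __init__(self, completion_queue: queue.Queue) -> None:
        self.completion_queue = completion_queue

    def start(self, op: str, request: Request) -> None:
        data = b""
        if op == "receive":
            offset = len(request.received)
            data = request.payload[offset:offset + self.CHUNK]
        note = Completion(request, op, data)
        threading.Timer(0.001, self.completion_queue.put, args=(note,)).start()


def worker(completion_queue: queue.Queue, os_api: FakeOS) -> None:
    while True:
        item = completion_queue.get()              # step 404: blocking dequeue
        if item is None:                           # sentinel, sketch only
            return
        if isinstance(item, Completion):           # step 406: analyze the item
            request = item.request
            request.received += item.data
            # Step 408: a receive is finished only once all data has arrived.
            if item.op == "send" or len(request.received) >= request.expected_len:
                request.ops.pop(0)
        else:
            request = item
        if not request.ops:                        # step 410: all operations done?
            request.done.set()                     # step 412: notify virtual user
        else:
            os_api.start(request.ops[0], request)  # step 414: next operation


completion_queue: queue.Queue = queue.Queue()
worker_thread = threading.Thread(
    target=worker, args=(completion_queue, FakeOS(completion_queue))
)
worker_thread.start()
requests = [Request("vu-1", b"hello world", 11), Request("vu-2", b"ping", 4)]
for request in requests:
    completion_queue.put(request)
for request in requests:
    request.done.wait(timeout=5)
completion_queue.put(None)
worker_thread.join()
```

In this sketch the first request needs three receive operations before step 410 finds it
complete, while the second needs one; both are served by the same worker thread.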

It should be noted that objects or other data structures may be represented by pointers, or
other references to those objects or data structures, a technique commonly used in the art.
Consequently, when reference is made to objects or other data structures being moved, sent or
returned, it does not necessarily imply that those data structures are moved in physical memory.
The term operating system may include any library or module that provides TCP/IP or other
network functionality, such as the Microsoft Winsock libraries. The illustrative embodiments use
the TCP/IP protocol, but the present invention may be adapted to other network protocols.

While the foregoing description and drawings represent illustrative embodiments of the
present invention, it will be understood that various changes and modifications may be made
without departing from the spirit and scope of the present invention.